EHN accept one-vs-all encoding for labels #410

glemaitre · 2018-03-01T11:05:07Z

This PR allows users to provide targets that are one-vs-all encoded.
This encoding is widely used in Keras for the loss function.

This is part of #409

glemaitre · 2018-03-01T14:43:04Z

@massich @mrastgoo @chkoar This is to support one-vs-all encoded targets.
The motivation is that in Keras a user will one-hot encode the target before providing it to the loss function. #409 will benefit greatly from handling this case automatically: the problem is converted to multiclass, the under-/over-sampling is applied, and the result is converted back.

It is ready for review, or at least for some comments on the internal changes.
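
The round trip described in this comment (one-hot targets converted to multiclass, resampled, then converted back) can be sketched as follows. This is a minimal, dependency-free illustration and not the code in this PR: `decode_one_hot`, `encode_one_hot`, and `random_over_sample` are hypothetical helpers, and plain random over-sampling stands in for whichever sampler is actually used.

```python
import random
from collections import Counter


def decode_one_hot(y_one_hot):
    """Turn one-hot rows into class indices, rejecting true multilabel rows."""
    labels = []
    for row in y_one_hot:
        # A one-vs-all row has exactly one 1 and zeros everywhere else.
        if row.count(1) != 1 or row.count(0) != len(row) - 1:
            raise ValueError("each row must contain exactly one 1")
        labels.append(row.index(1))
    return labels


def encode_one_hot(labels, n_classes):
    """Turn class indices back into one-hot rows."""
    return [[1 if k == label else 0 for k in range(n_classes)]
            for label in labels]


def random_over_sample(X, y_one_hot, seed=0):
    """Hypothetical helper: balance classes by duplicating minority samples."""
    rng = random.Random(seed)
    n_classes = len(y_one_hot[0])
    labels = decode_one_hot(y_one_hot)  # one-hot -> multiclass
    counts = Counter(labels)
    target = max(counts.values())
    indices = list(range(len(labels)))  # keep every original sample
    for label, count in counts.items():
        pool = [i for i, l in enumerate(labels) if l == label]
        indices.extend(rng.choice(pool) for _ in range(target - count))
    X_res = [X[i] for i in indices]
    # multiclass -> one-hot, so the caller gets back the encoding it passed in
    y_res = encode_one_hot([labels[i] for i in indices], n_classes)
    return X_res, y_res


X = [[0.0], [0.1], [0.2], [0.3], [1.0]]
y = [[1, 0], [1, 0], [1, 0], [1, 0], [0, 1]]
X_res, y_res = random_over_sample(X, y)
```

The caller passes one-hot targets and receives one-hot targets, so a Keras-style training loop would not need to know that the resampling happened on class indices.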

codecov · 2018-03-01T15:08:18Z

Codecov Report

Merging #410 into master will decrease coverage by <.01%.
The diff coverage is 98.36%.

@@            Coverage Diff             @@
##           master     #410      +/-   ##
==========================================
- Coverage   98.78%   98.77%   -0.01%     
==========================================
  Files          68       68              
  Lines        3961     4014      +53     
==========================================
+ Hits         3913     3965      +52     
- Misses         48       49       +1

Impacted Files	Coverage Δ
imblearn/utils/validation.py	`100% <100%> (ø)`	⬆️
imblearn/combine/smote_enn.py	`100% <100%> (ø)`	⬆️
imblearn/combine/smote_tomek.py	`100% <100%> (ø)`	⬆️
imblearn/utils/estimator_checks.py	`96.21% <100%> (+0.38%)`	⬆️
imblearn/utils/tests/test_validation.py	`100% <100%> (ø)`	⬆️
imblearn/ensemble/balance_cascade.py	`100% <100%> (ø)`	⬆️
imblearn/ensemble/base.py	`95.45% <94.73%> (-4.55%)`	⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 23ad602...dae94b9.

massich · 2018-03-19T13:40:27Z

imblearn/ensemble/base.py

+        Returns
+        -------
+        X_resampled : {ndarray, sparse matrix}, shape \
+(n_subset, n_samples_new, n_features)


massich · 2018-03-19T13:41:26Z

LGTM

glemaitre · 2018-03-19T13:42:32Z

Check the doc for the reason of not indenting.

…

On 19 March 2018 at 14:40, Joan Massich wrote:

> In imblearn/ensemble/base.py:
>
> +
> +    def sample(self, X, y):
> +        """Resample the dataset.
> +
> +        Parameters
> +        ----------
> +        X : {array-like, sparse matrix}, shape (n_samples, n_features)
> +            Matrix containing the data which have to be sampled.
> +
> +        y : array-like, shape (n_samples,)
> +            Corresponding label for each sample in X.
> +
> +        Returns
> +        -------
> +        X_resampled : {ndarray, sparse matrix}, shape \
> +(n_subset, n_samples_new, n_features)
>
> indenting

-- Guillaume Lemaitre INRIA Saclay - Parietal team Center for Data Science Paris-Saclay https://glemaitre.github.io/
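
On the indentation question, one possible reading, assuming the concern is how the continued line ends up in the docstring text: in a regular (non-raw) Python string, a backslash at the end of a line joins that line with the next one, and any leading whitespace on the continuation line becomes part of the joined line. The sketch below uses two hypothetical functions to show the difference. It is not taken from the documentation that the reply refers to.

```python
import inspect


def sample_unindented():
    """Resample the dataset.

    Returns
    -------
    X_resampled : {ndarray, sparse matrix}, shape \
(n_subset, n_samples_new, n_features)
        Resampled data.
    """


def sample_indented():
    """Resample the dataset.

    Returns
    -------
    X_resampled : {ndarray, sparse matrix}, shape \
        (n_subset, n_samples_new, n_features)
        Resampled data.
    """


# inspect.getdoc removes the common indentation of the docstring body.
unindented = inspect.getdoc(sample_unindented).splitlines()
indented = inspect.getdoc(sample_indented).splitlines()

# Line 4 is the "X_resampled : ..." entry in both docstrings.
print(repr(unindented[4]))
print(repr(indented[4]))
```

With the unindented continuation, the type and shape sit on one clean line. With the indented one, the extra spaces end up in the middle of that line.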

glemaitre · 2018-03-19T17:44:45Z

imblearn/__init__.py

@@ -13,6 +13,9 @@
 exceptions
    Module including custom warnings and error clases used across
    imbalanced-learn.
+keras


Should not be there

This PR aims to provide some utilities for Keras:

- [x] support for one-vs-all encoded targets (#410)
- [x] balanced batch generator

TODO:

- [x] Add common test to check multiclass == multilabel-indicator (#410)
- [x] Manage the specificity of the EasyEnsemble and BalanceCascade (overwrite `sample`)
- [x] Add user guide documentation
- [x] Add an example for simple use
- [x] Add an example for deep training
- [x] Add substitution
- [x] What's new
- [x] Optional dependencies

glemaitre added 6 commits March 1, 2018 02:03

EHN accept one-vs-all targets

dae3ba3

TST add test for check_target_type

7487ce4

PEP8

05ae2e6

TST fix pytests match warns

05b7d65

TST common test to check multiclass ova equality

1a27e3e

FIX/TST ensemble handle one-vs-all encoding

f5a583f

glemaitre mentioned this pull request Mar 1, 2018

EHN add support for some Keras utilities #409

Merged

10 tasks

massich reviewed Mar 19, 2018

glemaitre commented Mar 19, 2018

glemaitre added 3 commits March 19, 2018 18:46

DOC remove wrong documentation module

1435b35

add whats new entry

6350718

correct issue

dae94b9

glemaitre merged commit 24f4973 into scikit-learn-contrib:master Mar 20, 2018
